Evaluation of String Normalisation Modules for String-based Biomedical Vocabularies Alignment with AnAGram

Authors

  • Anique van Berne
  • Véronique Malaisé
Abstract

Biomedical vocabularies have specific characteristics that make their lexical alignment challenging. We have built a string-based vocabulary alignment tool, AnAGram, dedicated to efficiently comparing terms in the biomedical domain, and we evaluate its results against an algorithm based on the Jaro-Winkler edit distance. AnAGram is modular, which lets us evaluate the precision and recall of different normalization procedures. Overall, our normalization and replacement strategy improves the F-measure obtained in the edit-distance experiment by more than 100%. Most of this increase can be explained by targeted transformations of the strings, in particular the use of a dictionary of adjective/noun correspondences. However, we found that the classic Porter stemming algorithm needs to be adapted to the biomedical domain to give good-quality results in this area.
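
To make the comparison in the abstract concrete, the following is a minimal Python sketch of the two ingredients it names: a Jaro-Winkler similarity and a normalization step that replaces adjectives with their noun counterparts. It is not AnAGram's implementation. The `ADJ_TO_NOUN` dictionary, the function names, and the choice of lowercasing and punctuation stripping are assumptions made purely for illustration.

```python
import re

# Hypothetical adjective/noun correspondences; the real dictionary is not given in the abstract.
ADJ_TO_NOUN = {"renal": "kidney", "hepatic": "liver", "cardiac": "heart"}


def normalise(term):
    """Lowercase, drop punctuation, and map known adjectives to nouns."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(ADJ_TO_NOUN.get(token, token) for token in tokens)


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


if __name__ == "__main__":
    raw = jaro_winkler("renal failure", "kidney failure")
    norm = jaro_winkler(normalise("Renal Failure"), normalise("Kidney failure"))
    print(f"raw: {raw:.3f}  normalised: {norm:.3f}")
```

Under these assumptions, "Renal Failure" and "Kidney failure" normalise to the same string, so they match exactly, whereas the raw strings only reach a partial similarity score. This illustrates why a targeted adjective/noun replacement can recover matches that an edit-distance measure alone would rank lower.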

Similar resources

Comparing Usability of Matching Techniques for Normalising Biomedical Named Entities

String matching plays an important role in biomedical Term Normalisation, the task of linking mentions of biomedical entities to identifiers in reference databases. This paper evaluates exact, rule-based and various string-similarity-based matching techniques. The matchers are compared in two ways: first, we measure precision and recall against a gold-standard dataset and second, we integrate t...

SAMBO and SAMBOdtf Results for the Ontology Alignment Evaluation Initiative 2008

This article describes a base system for ontology alignment, SAMBO, and an extension, SAMBOdtf. We present their results for the benchmark, anatomy and FAO tasks in the 2008 Ontology Alignment Evaluation Initiative. For the benchmark and FAO tasks SAMBO uses a strategy based on string matching as well as the use of a thesaurus. It obtains good results in many cases. For the anatomy task SAMBO u...

Alignment-Based Discriminative String Similarity

A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a...

Nonlinear Dynamics of the Rotational Slender Axially Moving String with Simply Supported Conditions

In this research, dynamic analysis of the rotational slender axially moving string is investigated. String assumed as Euler Bernoulli beam. The axial motion of the string, gyroscopic force and mass eccentricity were considered in the study. Equations of motion are derived using Hamilton’s principle, resulting in two partial differential equations for the transverse motions. The equations are ch...

Asymptotic Approximations of the Solution for a Traveling String under Boundary Damping

Transversal vibrations of an axially moving string under boundary damping are investigated. Mathematically, it represents a homogenous linear partial differential equation subject to nonhomogeneous boundary conditions. The string is moving with a relatively (low) constant speed, which is considered to be positive.  The string is kept fixed at the first end, while the other end is tied with the ...

Publication date: 2014